Inference-Rules via Crowdsourcing
The importance of inference rules to semantic applications has long been recognized, and extensive work has been carried out to automatically acquire inference-rule resources. However, despite their potential, the use of inference-rule resources is currently limited, in part because a considerable number of their rules are in fact invalid. A possible solution to this problem is to enhance the quality of these resources by estimating the accuracy of their rules. Unfortunately, evaluating such resources has turned out to be a non-trivial task, leading to slow progress in the field. In this study, we demonstrate how the quality of an inference-rule resource can be enhanced using rule annotations. For this purpose, we propose a framework for evaluating inference rules using crowdsourcing. Our framework simplifies a previously proposed ‘instance-based evaluation’ method, which evaluates a rule based on a sample of its applications. The new framework eliminates the need for the substantial annotator training previously required, making it suitable for crowdsourcing. We show that our method produces a large number of annotations with a kappa of 0.78 (considered ‘substantial agreement’) between the online annotators and our manual annotations, without requiring the training of expert annotators. Two use cases are presented to demonstrate the use of the annotations gathered with the proposed crowdsourcing framework. The first demonstrates how the framework can be used to evaluate the quality of an inference-rule resource, either by its developers or by its potential users. The second accomplishes the main goal of this work, improving the quality of an inference-rule resource, by combining the score given by the rule-learning algorithm with the accuracy estimated from the crowdsourced rule-application annotations.
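
As an illustration of the two quantities mentioned above, the following minimal sketch estimates a rule's accuracy from a sample of annotated applications and measures agreement between two sets of annotations. It rests on assumptions not stated in the text: the kappa statistic is taken to be Cohen's kappa for two annotators, rule accuracy is taken to be the fraction of sampled applications judged valid, and the function names and toy labels are hypothetical.

```python
from collections import Counter


def rule_accuracy(labels):
    """Fraction of a rule's sampled applications judged valid (True).

    Assumption: accuracy is a plain proportion over the annotated sample.
    """
    if not labels:
        raise ValueError("need at least one annotated application")
    return sum(labels) / len(labels)


def cohens_kappa(a, b):
    """Cohen's kappa for two annotators who labeled the same items.

    Assumption: the reported kappa is Cohen's; the text does not say which
    variant was used.
    """
    if len(a) != len(b) or not a:
        raise ValueError("annotation lists must be non-empty and aligned")
    n = len(a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's label distribution.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Toy data (hypothetical): validity judgments for eight rule applications.
crowd = [True, True, False, True, False, False, True, True]
manual = [True, True, False, False, False, False, True, True]

print(rule_accuracy(crowd))        # 0.625
print(cohens_kappa(crowd, manual)) # 0.75
```

In the toy data the two annotation sets agree on 7 of 8 items, while chance agreement is 0.5, which gives a kappa of 0.75.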
